/*
Input: 	- raster_1km_ltw18_inter_area (from ArcGIS, contains an intersection of the raster cells (from rwi) and the 2018 precinct polygones and the area of each resulting entity)
		- n=1-7: WM_SUF_ohneText`n'.dta (real estate (flat rent) listing from rwi (GEO_RED_v3))

Output: election_rwi_red_sb_final.dta

Main task:
	> import geo data from ArcGIS with polygon-entities (intersected raster cells and precincts) and their area (all cells in Munich)
	> import real estate data from RWI (only flats rent)
	> select only listings in Munich
	> clean listings and calculate sq meter rents
	> calculate area weights from the areas of the geo data
	> assign with this weight listings to precincts (as raster cells might be split by precinct borders)
	> generate a balanced panel (precint year level)
	> merge listings to balanced panel
	(> imput precints without listings)
	

*/


* before in ARCGIS the shape file of the 2018 precincts was intersected with the raster data coming from rwi and the area of each entity was calculated

* PULL: data from ArcGIS
	import dbase using "$rawdata/RWI/02_data/tmp/precinct_raster_inter_area", clear 

* renmae id
* after the intersection only ids in Munich remain
	rename idm r1_id
	lab var r1_id "id for 1km raster"

*keep only id
	keep r1_id 
	
* keep a list of unique cell ids for Munich
	duplicates drop r1_id, force

*save tempfile
	tempfile munich_id1km
	save `munich_id1km'




*use the raw data coming from RWI (flats rent)
use "$rawdata/RWI/02_data/raw/RWI/GEO_RED_v3/Stata/WM_SUF_ohneText/WM_SUF_ohneText1.dta", clear

*append other seven files with the same information
	foreach n of num 2/8 {
	append using "$rawdata/RWI/02_data/raw/RWI/GEO_RED_v3/Stata/WM_SUF_ohneText/WM_SUF_ohneText`n'.dta"
	}

* keep only listings from munich
	merge m:1 r1_id using `munich_id1km', nogen keep(3)
	
* gen vector of 1s to get the number of listings by raster cell and year later	
	gen ones = 1

*clean data
*destring numeric values of interest
	destring mietekalt wohnflaeche, replace force
*drop unrealistic values
	drop if mietekalt < 100
*gen a price per square meter
	gen mprice = mietekalt/wohnflaeche
* drop missing prices	
	drop if mprice==.
	
* get average rent per raster cell and year
	collapse (sum) ones (mean) mprice, by(jahr r1_id)

*rename for merge
	rename jahr year

*save tempfile
	tempfile rwi_red_final_merger
	save `rwi_red_final_merger'



* import output from GIS
	import dbase using "$rawdata/RWI/02_data/tmp/precinct_raster_inter_area", clear 

* rename IDs and area
	rename idm r1_id
	lab var r1_id "id for 1km raster"
	rename LTW_SB_2_1 sb_new
	lab var sb_new "precinct in 2018"
	lab var F_AREA "entity area (calculated in ArcGIS)"
	
*keep IDs for raster cells and precincts and the area of all intersected entities
	keep r1_id sb_new F_AREA

* the area of a precinct is calculated here
	bys sb_new: egen sb_area = total(F_AREA)
* the weight comes from the area of the intersected entity and the precinct weight
	gen area_weight = F_AREA / sb_area

* get the data to the level of entities
	collapse (sum) area_weight, by(r1_id sb_new)

* expand for years 2007 and 2020 where there is real etate data
	expand 14 

*necessary list of ids (raster and precinct)
	bys r1_id sb_new: gen year = _n

	replace year = 2007 if year==1
	replace year = 2008 if year==2
	replace year = 2009 if year==3
	replace year = 2010 if year==4
	replace year = 2011 if year==5
	replace year = 2012 if year==6
	replace year = 2013 if year==7
	replace year = 2014 if year==8
	replace year = 2015 if year==9
	replace year = 2016 if year==10
	replace year = 2017 if year==11
	replace year = 2018 if year==12
	replace year = 2019 if year==13
	replace year = 2020 if year==14

* merge the information from the listings
	merge m:1 r1_id year using `rwi_red_final_merger', nogen assert(1 3) keep(3) // if some cell is missing in a year, _merge==1

* we want values for each precint and year. and we consider only listings by the weight their raster cell falls into the respective precinct
	collapse (sum) ones (mean) mprice [pw = area_weight], by(sb_new year)

*rename and label
	rename mprice mpreis_flats_rent
	lab var mpreis_flats_rent "average square meter rent for flats flats, source: RWI"

*drop temporary files
	drop ones

* changes variabes for merging to voter panel
	rename year jahr
	destring sb_new, replace

* keep only years in which elections are
	keep if jahr >=2013
	drop if jahr==2015 | jahr==2016

*save
*cf * using "$rawdata/RWI/02_data/final/election_rwi_red_sb_final"
	save "$rawdata/RWI/02_data/final/election_rwi_red_sb_final", replace

